grep -l "@rtx 3090" /news/*.json | wc -l → 31

RTX 3090

mentions 31 · type Person

// recent coverage 31 mentions

00:23

2026-07-09

gilesthomas.com

artificial-intelligence

poppy the training box, part 1: the beginnings

A developer repurposed an old small-form-factor PC named 'poppy' into a dedicated machine for local LLM training, upgrading its case and power supply to accommodate future multi-GPU setups. The projec…

17:59

2026-07-08

gilesthomas.com

large-language-models

Writing an LLM from scratch, part 34b -- from bigrams to GPT-2, one component at a time (in JAX)

Giles Thomas completed building and training a GPT-2 small model from scratch using JAX, achieving a test loss of 3.418784, outperforming both his PyTorch model (3.538161) and the original GPT-2 small…

10:32

2026-07-04

dev.to

large-language-models

Solving the GPU Pinning Saga and Gemma's Meta-Commentary

Glad Labs fixed a GPU pinning issue where LiteLLM 1.89.2's global api_base override prevented per-model routing, causing vision tasks to cold-load onto the wrong GPU. The team also hardened content gu…

15:03

2026-07-03

github.com

large-language-models

Jamesob's guide to running SOTA LLMs locally

Jamesob published a guide on building a local system to run state-of-the-art large language models, detailing hardware configurations ranging from $2k to $40k. The setup uses multiple RTX GPUs and PCI…

16:37

2026-06-30

aws.amazon.com

artificial-intelligence

How Outpost VFX Uses AWS to Accelerate AI Model Training for Visual Effects

Outpost VFX achieved 8x faster AI model training for visual effects face replacement by using AWS multi-GPU EC2 P5 instances, overcoming single-GPU bottlenecks that previously caused week-long delays.…

13:00

2026-06-30

vettedconsumer.com

large-language-models

Bandwidth, Not TFLOPS: What Sets Your Local LLM Speed (and Why the Newest Card Isn't Always Fastest)

Owner-submitted benchmarks show that memory bandwidth, not TFLOPS, determines local LLM generation speed. An AMD RX 7900 XTX with 122 TFLOPS generates text at 39 tokens per second, while an older RTX …

11:14

2026-06-30

byteiota.com

large-language-models

Qwen3.6 MTP in llama.cpp: 27B Model Now 1.7x Faster

On May 16, 2026, llama.cpp merged Multi-Token Prediction (MTP) support, enabling 1.7x to 2.4x faster local inference for Qwen3.6 27B models with no accuracy loss or extra downloads. The MTP head is em…

07:58

2026-06-30

github.com

large-language-models

TurboPrefill: 2.7× faster than llama.cpp Pipeline Parallel on Llama-3-70B

TurboPrefill introduces intra-prompt pipeline scheduling for multi-GPU prefill, achieving up to 2.7× faster performance than llama.cpp on Llama-3-70B by overlapping GPU stage execution. The PoC shows …

21:24

2026-06-29

lesswrong.com

ai-safety

Role confusion: sounding like the cause is indistinguishable from being it.

A replication of the 2026 paper 'Prompt Injection as Role Confusion' on a single consumer GPU confirms that style-based prompt injection attacks work, but causal tests using activation steering and pa…

06:54

2026-06-28

dev.to

large-language-models

How to Run Reliable Local LLM Agents on an RTX 3090: A Benchmark (5 Models, Priced in Watts)

A developer benchmarked five local LLM agents on an RTX 3090, finding that the orchestrator, not the model, determines success. GLM-4.5-Air scored 0% with opencode but 93% with a LangGraph agent, whil…

01:03

2026-06-21

vettedconsumer.com

artificial-intelligence

Three RTX 3060s vs One RTX 3090 for Local AI: What a $1,500 Build Actually Measured

A $1,500 build using three used RTX 3060s (36GB VRAM) matched or outperformed a single RTX 3090 (24GB) in local AI benchmarks, achieving 18.2 tokens/second on Qwen 3.6 27B vs. 16.8 for the 3090, and 2…

21:24

2026-06-20

dev.to

large-language-models

I spent two weeks optimizing 96GB of VRAM for local LLMs. Paid APIs still won.

A developer spent two weeks optimizing a homelab with four RTX 3090s (96GB VRAM) for local LLM inference, achieving improvements like 40% throughput gain and 4x VRAM savings, but ultimately found that…

13:49

2026-06-18

blog.attacks.ai

ai-agents

We Got Anthropic's Glasswing at Home (Who Needs Mythos 5 or Fable 5?)

A security researcher built Lucent, an autonomous vulnerability-hunting pipeline inspired by Anthropic's Glasswing, using a local Qwen3.6-27B model on a single RTX 3090. The system found two real bugs…

09:02

2026-06-18

dev.to

developer-tools

How to Set Up a Local AI Coding Assistant in VS Code – Free & Private

A developer has published a guide for setting up a local AI coding assistant in VS Code using Continue and Ollama, achieving tab autocomplete and code chat entirely on-device. The setup requires a GPU…

14:16

2026-06-16

byteiota.com

large-language-models

Local LLMs vs Claude for Coding: The 70% Problem

A Hacker News thread on June 16 revealed that local LLMs like Qwen 3.6 35B-A3B handle about 70% of daily coding tasks but fall short on complex multi-file reasoning, creating a gap akin to a junior ve…

02:20

2026-06-16

discuss.huggingface.co

ai-agents

Assimetric parallel inference using consumer RTX PC

A user with a 24GB RTX 3090 and i5-10400 PC is experimenting with asymmetric parallel inference to reduce model looping and agent freezing, using their gaming PC as a platform to learn basic agentic A…

18:37

2026-06-15

discuss.huggingface.co

large-language-models

Unusual parallel inference using consumer RTX rig

A technical report proposes using the integrated GPU (iGPU) of a consumer RTX 3090 rig to run a small language model as a 'Sentinel' for monitoring and validating outputs from the primary GPU-bound model. The …

00:00

2026-06-15

glukhov.org

large-language-models

Cost Optimization for LLM Systems: Where the Money Actually Goes

LLM costs scale linearly with usage, and enterprises spending over $10,000 annually can optimize by implementing token budgets, choosing between API and local inference, and using fallback strategies.…

11:39

2026-06-14

runtimewire.com

artificial-intelligence

Pearl's AI mining pitch faces a 112 MW usefulness test

Pearl Research Labs claims to have built a live GPU-secured blockchain but has not proven that the work is useful AI computation, and it now faces a 112 MW usefulness test. A preprint estimates Pearl's network runs at …

05:11

2026-06-14

news.ycombinator.com

ai-startups

Story of How Im Running an Unlimited $6/Month AI Provider on 4x RTX 3090s

A developer launched an unlimited LLM provider for $6/month using 4x RTX 3090s after a failed launch on AMD MI300x hardware. Despite initial bugs and hardware issues, the service gained traction with …

// co-occurs with top 8 entities

NVIDIA (10) · llama.cpp (7) · RTX 5080 (5) · RTX 4090 (5) · Gemma (4) · Qwen (3) · Hugging Face (3) · Ollama (3)